Problem Note 38368: Incorrect probabilities might be generated from bagging models
In SAS® Enterprise Miner™, bagged models are created by iteratively modeling resampled observations. The model from each iteration contributes to the final total score. Some lines in the Flow code of the modeling node save values from the first iteration. However, those lines are missing from the Score code, so the model from the first iteration does not contribute to the total score.
There are no error or warning messages to indicate a problem. The only way to tell that the problem occurred is to inspect the created Score code and compare it to the Flow code. There is no workaround.
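The comparison described above can be partly automated. The following is a minimal sketch in Python (not SAS), assuming that the Flow code and Score code have been saved as plain text and that the relevant statements have the simple form EndGrp_P_<name> = P_<name>; shown in the example below. The function names saved_probabilities and missing_from_score are hypothetical helpers, not part of SAS Enterprise Miner.

```python
import re

# Matches simple assignments such as "EndGrp_P_TARGETyes = P_TARGETyes;".
# Matching is case-insensitive because SAS variable names are not case-sensitive.
SAVE_PATTERN = re.compile(r"\bEndGrp_(P_\w+)\s*=\s*(P_\w+)\s*;", re.IGNORECASE)


def saved_probabilities(code):
    """Return the lower-cased P_ variables that the code copies into EndGrp_P_ variables."""
    found = set()
    for match in SAVE_PATTERN.finditer(code):
        target, source = match.group(1).lower(), match.group(2).lower()
        # Keep only statements where the suffix of the target equals the source,
        # for example EndGrp_P_TARGETyes = P_TARGETyes.
        if target == source:
            found.add(source)
    return found


def missing_from_score(flow_code, score_code):
    """Return the saved-probability variables present in Flow code but absent from Score code."""
    return saved_probabilities(flow_code) - saved_probabilities(score_code)
```

An empty result means that every matching assignment found in the Flow code was also found in the Score code. A non-empty result lists the probabilities whose first-iteration values are not saved in the Score code. The pattern only recognizes the simple assignment form, so the generated code should still be inspected by hand.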
Example: Suppose that your target variable is called TARGET and that it takes the values "yes" and "no". The following lines appear just past the end of the first iteration in the Flow code:
*--------------------------------------------------------------*;
* Bagging: Saving Probabilities for Next Iteration;
*--------------------------------------------------------------*;
EndGrp_P_TARGETyes = P_TARGETyes;
EndGrp_P_TARGETno = P_TARGETno;
These lines should also appear in the Score code. However, they do not. These lines assign the predicted probability from the first iteration model to variables of the form EndGrp_P_<varname><varlevel>.
Following each iteration, the code computes a weighted average of this EndGrp_P_ variable and the predicted probabilities from later iterations to produce the predicted probability from bagging.
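To illustrate why the missing assignment matters, here is a small numeric sketch in Python (not SAS). It assumes an equal-weight running average across iterations. The note does not state the actual weights that the generated code uses, and the probabilities below are made-up values. The defect is simulated simply by leaving out the first iteration's value.

```python
def bagged_probability(iteration_probs):
    """Equal-weight running average of per-iteration predicted probabilities.

    Assumption: each iteration is weighted equally. The real generated
    code may use different weights.
    """
    running = None
    for k, p in enumerate(iteration_probs, start=1):
        if running is None:
            # Plays the role of: EndGrp_P_TARGETyes = P_TARGETyes;
            running = p
        else:
            # Fold the k-th iteration into the average of the first k-1.
            running = (running * (k - 1) + p) / k
    return running


# Hypothetical P_TARGETyes values from three bagging iterations.
probs = [0.9, 0.6, 0.3]

with_first = bagged_probability(probs)          # all three iterations contribute
without_first = bagged_probability(probs[1:])   # first iteration dropped
```

With these made-up inputs, the average over all three iterations is about 0.60, while the average that omits the first iteration is about 0.45. The size and direction of the difference depend entirely on the data and on how the first-iteration model differs from the others.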
Operating System and Release Information
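Product Family: | SAS System |
Product: | SAS Enterprise Miner |
System | Product Release (Reported) | Product Release (Fixed*) | SAS Release (Reported) | SAS Release (Fixed*) |
Microsoft® Windows® for 64-Bit Itanium-based Systems | 5.3 | 6.2 | 9.1 TS1M3 SP4 | 9.2 TS2M3 |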
Microsoft Windows Server 2003 Datacenter 64-bit Edition | 5.3 | 6.2 | 9.1 TS1M3 SP4 | 9.2 TS2M3 |
Microsoft Windows Server 2003 Enterprise 64-bit Edition | 5.3 | 6.2 | 9.1 TS1M3 SP4 | 9.2 TS2M3 |
Microsoft Windows XP 64-bit Edition | 5.3 | 6.2 | 9.1 TS1M3 SP4 | 9.2 TS2M3 |
Microsoft Windows 2000 Advanced Server | 5.3 | | 9.1 TS1M3 SP4 | |
Microsoft Windows 2000 Datacenter Server | 5.3 | | 9.1 TS1M3 SP4 | |
Microsoft Windows 2000 Server | 5.3 | | 9.1 TS1M3 SP4 | |
Microsoft Windows 2000 Professional | 5.3 | | 9.1 TS1M3 SP4 | |
Microsoft Windows Server 2003 Datacenter Edition | 5.3 | 6.2 | 9.1 TS1M3 SP4 | 9.2 TS2M3 |
Microsoft Windows Server 2003 Enterprise Edition | 5.3 | 6.2 | 9.1 TS1M3 SP4 | 9.2 TS2M3 |
Microsoft Windows Server 2003 Standard Edition | 5.3 | 6.2 | 9.1 TS1M3 SP4 | 9.2 TS2M3 |
Microsoft Windows XP Professional | 5.3 | 6.2 | 9.1 TS1M3 SP4 | 9.2 TS2M3 |
64-bit Enabled AIX | 5.3 | 6.2 | 9.1 TS1M3 SP4 | 9.2 TS2M3 |
64-bit Enabled HP-UX | 5.3 | 6.2 | 9.1 TS1M3 SP4 | 9.2 TS2M3 |
64-bit Enabled Solaris | 5.3 | 6.2 | 9.1 TS1M3 SP4 | 9.2 TS2M3 |
HP-UX IPF | 5.3 | 6.2 | 9.1 TS1M3 SP4 | 9.2 TS2M3 |
Linux | 5.3 | 6.2 | 9.1 TS1M3 SP4 | 9.2 TS2M3 |
Linux on Itanium | 5.3 | 6.2 | 9.1 TS1M3 SP4 | 9.2 TS2M3 |
Solaris for x64 | 5.3 | 6.2 | 9.1 TS1M3 SP4 | 9.2 TS2M3 |
Tru64 UNIX | 5.3 | 6.2 | 9.1 TS1M3 SP4 | 9.2 TS2M3 |
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.
Type: | Problem Note |
Priority: | alert |
Date Modified: | 2010-02-04 13:07:43 |
Date Created: | 2010-01-11 16:47:55 |